It is difficult to define hit quality accurately with a single value such as play result. Play result is a good indicator of quality, but a good hit can also be characterized by other metrics, such as launch angle and field position, and all of these qualities can be combined into a single metric. The idea is that features in the model are weighted differently based on difficulty, frequency, and importance in scoring. The values within each feature are also weighted. For example, bearing has an individual weight in the model, and each bearing option (left field, right field) also has its own weight. The model creates a new weighted metric that holistically represents hit quality rather than just the play result.
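The two levels of weighting described above can be sketched as follows. This is a minimal illustration, not the model itself: it assumes the score is a simple sum of feature weight times option weight, and every number and name in it (`FEATURE_WEIGHTS`, `OPTION_WEIGHTS`, `hit_quality`) is hypothetical.

```python
# Hypothetical weights for illustration only; none of these values come from the analysis.
FEATURE_WEIGHTS = {"bearing": 0.4, "pitch_type": 0.6}

# Each option within a feature carries its own weight as well.
OPTION_WEIGHTS = {
    "bearing": {"left_field": 1.0, "right_field": 0.5},
    "pitch_type": {"fastball": 0.5, "curveball": 1.0},
}


def hit_quality(hit):
    """Sum feature weight x option weight over every feature in FEATURE_WEIGHTS."""
    score = 0.0
    for feature, feature_weight in FEATURE_WEIGHTS.items():
        option = hit[feature]
        score += feature_weight * OPTION_WEIGHTS[feature][option]
    return score


example_hit = {"bearing": "left_field", "pitch_type": "curveball"}
print(hit_quality(example_hit))
```

Keeping the feature weights separate from the option weights means either level can be adjusted without touching the other.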
Ultimately, the features for this model were chosen based on a mix of their effect on play result and survey information I received from baseball players.
I will be investigating several features and their relationships to each other, as well as their relationship to the type of hit and the result of the play. Features that influence both the quality of a hit and each other should be included in the model. Overall, the quality of a hit depends on whether the ball makes it out of the infield and can result in a single, double, triple, or HR. The features I will be exploring are bearing, pitch type, exit speed, distance, launch angle, and hit type.
Field position plays a role in the likelihood of each play result. A ball hit to left field is more likely to result in a single, a double, or a home run than one hit to right field. A ball hit to left field is also less likely to result in an out than one hit to right field. There is no significant difference between field positions for the other play results. In summary, negative bearing (left field) should hold a higher weight than positive bearing (right field) within the model.
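A comparison like this can be made by computing the share of each play result separately for each side of the field. The sketch below uses made-up rows purely to show the calculation; it assumes negative bearing means left field, and the names `field_side` and `result_rates` are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up rows for illustration only: (bearing in degrees, play result).
hits = [
    (-20.0, "Single"),
    (-12.5, "Out"),
    (-30.0, "HomeRun"),
    (15.0, "Out"),
    (22.0, "Out"),
    (8.0, "Single"),
]


def field_side(bearing):
    # Assumption: negative bearing is left field, positive is right field.
    # A bearing of exactly 0 falls on the right side in this sketch.
    return "left" if bearing < 0 else "right"


def result_rates(rows):
    """Return, for each field side, the share of hits ending in each play result."""
    counts = defaultdict(Counter)
    for bearing, result in rows:
        counts[field_side(bearing)][result] += 1
    return {
        side: {result: n / sum(c.values()) for result, n in c.items()}
        for side, c in counts.items()
    }


rates = result_rates(hits)
for side, shares in rates.items():
    print(side, shares)
```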
What makes a pitch difficult? Curveballs and sliders are notably the hardest pitches to hit because of their movement and the pitcher's ability to put some speed on them. Using this data, I will first evaluate difficulty based on how frequently each pitch is thrown, since pitchers tend to throw what is most effective. I will create initial weights using a weighted average by pitch type, with adjustments based on the baseball knowledge that breaking balls (curveballs and sliders) are some of the most difficult pitches to hit.
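One way to turn this into numbers is sketched below. It is an assumption about the procedure, not the exact calculation used here: each pitch type starts with its share of all pitches, breaking balls are multiplied by a hypothetical boost factor, and the results are rescaled to sum to 1. The pitch labels, the factor of 1.5, and the function name `pitch_weights` are all illustrative.

```python
from collections import Counter

BREAKING_BALLS = {"Curveball", "Slider"}
BREAKING_BALL_BOOST = 1.5  # hypothetical adjustment factor, not taken from the analysis


def pitch_weights(pitches, boost=BREAKING_BALL_BOOST):
    """Frequency share per pitch type, boosted for breaking balls, renormalized to sum to 1."""
    counts = Counter(pitches)
    total = sum(counts.values())
    raw = {
        pitch: (n / total) * (boost if pitch in BREAKING_BALLS else 1.0)
        for pitch, n in counts.items()
    }
    norm = sum(raw.values())
    return {pitch: weight / norm for pitch, weight in raw.items()}


# Made-up pitch log for illustration only.
pitches = ["Fastball"] * 6 + ["Slider"] * 2 + ["Curveball"] * 2
weights = pitch_weights(pitches)
for pitch, weight in weights.items():
    print(pitch, round(weight, 3))
```

Renormalizing after the boost keeps the weights interpretable as a percent of the total.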
Four-seam fastballs are the most frequently thrown pitch type and are the most likely to result in a foul ball or an out. Below is each pitch type's weight as a percent of the total.
Quality contact means that the ball is able to escape the infield. A ball that reaches the outfield is less likely to result in an out and more likely to result in a single, a double, a triple, or, of course, a home run.
Exit speed is extremely important because it affects how far the ball is able to travel. It is important that the ball travels out of the infield because a ball that does so is less likely to result in an out. There is a positive linear correlation between exit speed and distance.
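The strength of a linear relationship like this one can be checked with the Pearson correlation coefficient. The sketch below is a generic implementation on made-up numbers, not the analysis data; it assumes rows with a missing or non-finite measurement are dropped before the calculation, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences, skipping non-finite pairs."""
    pairs = [(x, y) for x, y in zip(xs, ys) if math.isfinite(x) and math.isfinite(y)]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only; the NaN row is dropped before computing r.
exit_speed = [70.0, 80.0, 90.0, float("nan"), 100.0]
distance = [150.0, 210.0, 260.0, 300.0, 330.0]
r = pearson_r(exit_speed, distance)
print(round(r, 3))
```

A value of r close to 1 indicates a strong positive linear relationship.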
Since distance and exit speed are positively correlated, and greater distance means that the hit is more likely to result in an effective play such as a single, double, triple, or HR, high exit speed is also related to these play results.